skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 7:00 AM ET to 7:30 AM ET on Friday, April 24 due to maintenance. We apologize for the inconvenience.


Search for: All records

Creators/Authors contains: "Tang, Wei Yu"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. RNA design aims to find a sequence that folds with the highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., “motifs”) that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. 
    more » « less
  2. RNA design aims to find a sequence that can fold into a given (secondary) structure, which has wide applications in science and medicine. However, it has long been known that there are “undesignable structures” for which no sequence can fold into them according to the minimum free energy (MFE) criterion under the standard RNA folding energy model. Our previous work showed that undesignable structures can be effectively and efficiently identified by searching for rival structures. We further showed that there exist “minimal undesignable motifs” within those undesignable structures, where a (structural) motif is a set of consecutive loops and helices within a secondary structure. To better illustrate our theoretical findings, we built a motif server as a user-friendly visualization tool for undesignable RNA structures and motifs, as well as an interactive demo tool where the user can input a new structure and the server will compute and visualize any undesignable motifs within it on the fly. This server maintains a database of undesignable RNA structures and unique minimal undesignable RNA motifs, allowing the users to explore, visualize, analyze, and identify undesignable motifs in existing and new RNA structures. The importance of this server is that it provides a database of motifs for nanostructure design that should not be incorporated because these motifs are unlikely to be achievable. Availability: Our server is at https://linearfold.eecs.oregonstate.edu/motifserver. 
    more » « less
  3. RNA design aims to find a sequence that folds into a designated target structure under a specific RNA folding model, also known as the inverse problem of RNA folding. While numerous RNA design methods have been invented to search for sequences capable of folding into a target structure under the default (Turner) RNA folding model, little attention has been given to the identification of undesignable structures. This work bridges the gap between RNA design and undesignability by introducing a series of theorems and algorithms aimed at identifying both undesignable structures and their causative local structural components, which we define asminimal undesignable motifs. We first present theorems that provide sufficient conditions for recognizing undesignability structures and propose efficient, theorem-guided algorithms to verify whether an RNA structure is undesignable. While such global undesignability sheds light on the limits of RNA design, identifying the specific motifs responsible for undesignability is critical for understanding RNA folding models and advancing design methodologies. To this end, we develop a new theoretical framework for motif undesignability and propose scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identified 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Last but not least, we demonstrate that our theory and algorithms can handle motifs with external loops—a critical advancement given the substantial impact of external loops on the quantity, diversity, and designability of RNA structure motifs. 
    more » « less
  4. RNA design aims to find a sequence that can fold into a target secondary structure. It can create artificial RNA molecules for specific functions, with wide applications in medicine. It is computationally challenging due to two levels of combinatorial explosion: the exponentially large design space and the exponentially many competing structures per design. Popular methods such as local search cannot keep up with these combinatorial explosions. We instead employ two techniques from machine learning, continuous optimization and Monte-Carlo sampling. We start from a distribution over all valid sequences, and use gradient descent to improve the expectation of an arbitrary objective function. We define novel coupled-variable distributions to model the correlation between nucleotides. We then use sampling to approximate the objective, estimate the gradient, and select the final candidate. Our work consistently outperforms state-of-the-art methods in key metrics including Boltzmann probability and ensemble defect, especially on long and hard-to-design structures. 
    more » « less
  5. RNA design aims to find a sequence that folds with the highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., “motifs”) that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. 
    more » « less
  6. Abstract MotivationThe task of designing optimized messenger RNA (mRNA) sequences has received much attention in recent years, thanks to breakthroughs in mRNA vaccines during the COVID-19 pandemic. Because most previous work aimed to minimize the minimum free energy (MFE) of the mRNA in order to improve stability and protein expression, which only considers one particular structure per mRNA sequence, millions of alternative conformations in equilibrium are neglected. More importantly, we prefer an mRNA to populate multiple stable structures and be flexible among them during translation when the ribosome unwinds it. ResultsTherefore, we consider a new objective to minimize the ensemble free energy of an mRNA, which includes all possible structures in its Boltzmann ensemble. However, this new problem is much harder to solve than the original MFE optimization. To address the increased complexity of this problem, we introduce EnsembleDesign, a novel algorithm that employs continuous relaxation to optimize the expected ensemble free energy over a distribution of candidate sequences. EnsembleDesign extends both the lattice representation of the design space and the dynamic programming algorithm from LinearDesign to their probabilistic counterparts. Our algorithm consistently outperforms LinearDesign in terms of ensemble free energy, especially on long sequences. Interestingly, as byproducts, our designs also enjoy lower average unpaired probabilities (which correlates with degradation) and flatter Boltzmann ensembles (more flexibility between conformations). Availability and implementationOur code is available on: https://github.com/LinearFold/EnsembleDesign. 
    more » « less
  7. RNA design aims to find a sequence that folds with the highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., “motifs”) that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve that, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. 
    more » « less
  8. RNA design is the search for a sequence or set of sequences that will fold into predefined structures, also known as the inverse problem of RNA folding. While numerous RNA design methods have been invented to find sequences capable of folding into a target structure, little attention has been given to the identification of undesignable structures according to the minimum free energy ( ) criterion under the Turner model. In this paper, we address this gap by first introducing mathematical theorems outlining sufficient conditions for recognizing undesignable structures, then proposing efficient algorithms, guided by these theorems, to verify the undesignability of RNA structures. Through the application of these theorems and algorithms to the Eterna100 puzzles, we demonstrate the ability to efficiently establish that 15 of the puzzles indeed fall within the category of undesignable structures. In addition, we provide specific insights from the study of undesignability, in the hope that it will enable more understanding of RNA folding and RNA design. Availability: Our source code is available at https://github.com/shanry/RNA-Undesign. 
    more » « less